Down the rabbit hole with Tensorflow

In this seminar, we're going to play with Tensorflow and see how it helps you build deep learning models.

If you're running this notebook outside the course environment, you'll need to install Tensorflow v1:

  • pip install tensorflow==1.15.2 should install CPU-only TF on Linux & Mac OS
  • If you want GPU support from the outset, see the TF install page. pip install tensorflow-gpu==1.15.2 might or might not work.

In [ ]:
import sys, os
if 'google.colab' in sys.modules:
    %tensorflow_version 1.x
    
    if not os.path.exists('.setup_complete'):
        !wget -q https://raw.githubusercontent.com/yandexdataschool/Practical_RL/spring20/setup_colab.sh -O- | bash

        !wget -q https://raw.githubusercontent.com/yandexdataschool/Practical_RL/spring20/week04_[recap]_deep_learning/mnist.py

        !touch .setup_complete

# This code creates a virtual display to draw game images on.
# It will have no effect if your machine has a monitor.
if type(os.environ.get("DISPLAY")) is not str or len(os.environ.get("DISPLAY")) == 0:
    !bash ../xvfb start
    os.environ['DISPLAY'] = ':1'

In [ ]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [ ]:
import tensorflow as tf

# The session is the main tensorflow object: you ask it to compute things for you.
sess = tf.InteractiveSession()

Warming up

For starters, let's implement a python function that computes the sum of squares of numbers from 0 to N-1.

  • Use numpy or python
  • numpy.arange(N) gives an array of the numbers 0 to N-1

In [ ]:
def sum_squares(N):
    return <YOUR CODE: student.implement_me()>

In [ ]:
%%time
sum_squares(10**8)

Same with tensorflow


In [ ]:
# "i will insert N here later"
N = tf.placeholder('int64', name="input_to_your_function")

# a recipe on how to produce {sum of squares of arange of N} given N
result = tf.reduce_sum((tf.range(N)**2))

In [ ]:
%%time

# dear session, compute the result please. Here's your N.
print(sess.run(result, {N: 10**8}))

# hint: run it several times to let tensorflow "warm up"

How it works: computation graphs

  1. create placeholders for future inputs;
  2. define symbolic graph: a recipe for mathematical transformation of those placeholders;
  3. compute outputs of your graph with particular values for each placeholder
    • sess.run(outputs, {placeholder1:value1, placeholder2:value2})
    • OR output.eval({placeholder:value})
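The three steps above can be sketched roughly as follows. This is a minimal illustration, not part of the assignment: the names `a`, `b`, `dot` and `demo_sess` are made up for this example, and the `tensorflow.compat.v1` import is an assumption that lets the same code run on TF 1.15 as well as on a TF 2.x install.

```python
import tensorflow.compat.v1 as tf  # assumption: TF 1.15 or TF 2.x with the v1 compat API

tf.disable_v2_behavior()

# Build the example in its own graph so it does not touch the notebook's default graph.
graph = tf.Graph()
with graph.as_default():
    # 1. placeholders for future inputs
    a = tf.placeholder('float32', shape=(None,), name='a')
    b = tf.placeholder('float32', shape=(None,), name='b')

    # 2. symbolic recipe: dot product of the two vectors
    dot = tf.reduce_sum(a * b)

# 3. compute the output for particular placeholder values
with tf.Session(graph=graph) as demo_sess:
    value = demo_sess.run(dot, {a: [1., 2., 3.], b: [4., 5., 6.]})
    value_eval = dot.eval({a: [1., 0.], b: [7., 9.]}, session=demo_sess)

print(value, value_eval)
```

Nothing is computed while the graph is being defined; numbers only appear once the session is asked to run `dot` with concrete inputs.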

Still confused? We're going to fix that.

Placeholders and constants


In [ ]:
# placeholder that can be an arbitrary float32 scalar, vector, matrix, etc.
arbitrary_input = tf.placeholder('float32')

# input vector of arbitrary length
input_vector = tf.placeholder('float32', shape=(None,))

# input vector that _must_ have 10 elements and integer type
fixed_vector = tf.placeholder('int32', shape=(10,))

# you can generally use None whenever you don't need a specific shape
input1 = tf.placeholder('float64', shape=(None, 100, None))
input2 = tf.placeholder('int32', shape=(None, None, 3, 224, 224))

You can create new tensors with arbitrary operations on placeholders, constants and other tensors.

  • tf.reduce_sum(tf.range(N)**2) is 3 sequential transformations of placeholder N
  • there's a tensorflow symbolic version of most numpy functions
    • a + b, a / b, a ** b, ... behave just like in numpy
    • np.zeros -> tf.zeros
    • np.sin -> tf.sin
    • np.mean -> tf.reduce_mean
    • np.arange -> tf.range

There is plenty more in tensorflow; see the docs or learn as you go with shift+tab.


In [ ]:
# elementwise multiplication
double_the_vector = input_vector * 2

# elementwise cosine
elementwise_cosine = tf.cos(input_vector)

# elementwise difference between the squared vector and its mean - with some random salt
vector_squares = input_vector ** 2 - \
    tf.reduce_mean(input_vector) + tf.random_normal(tf.shape(input_vector))

Practice 1: polar pretzels

inspired by this post

There are some simple mathematical functions with cool plots. For one, consider this:

$$ x(t) = t - 1.5 \cos(15 t) $$
$$ y(t) = t - 1.5 \sin(16 t) $$

In [ ]:
t = tf.placeholder('float32')


# compute x(t) and y(t) as defined above.
x = <YOUR CODE>
y = <YOUR CODE>


x_points, y_points = sess.run([x, y], {t: np.linspace(-10, 10, num=10000)})
plt.plot(x_points, y_points)

Visualizing graphs with Tensorboard

It's often useful to visualize the computation graph when debugging or optimizing. Interactive visualization is where tensorflow really shines as compared to other frameworks.

There's a special instrument for that, called Tensorboard. You can launch it from console:

tensorboard --logdir=/tmp/tboard --port=7007

If you're pathologically afraid of consoles, try this:

import os; os.system("tensorboard --logdir=/tmp/tboard --port=7007 &")

(but don't tell anyone we taught you that)

One basic functionality of tensorboard is drawing graphs. Once you've run the cell above, go to localhost:7007 in your browser and switch to the graphs tab in the topbar.
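Tensorboard can only draw a graph that has been written to its log directory. One way to do that, sketched here under assumptions, is `tf.summary.FileWriter`. The example writes to a temporary directory (a hypothetical choice; to see the graph in the Tensorboard launched above you would pass /tmp/tboard instead), and the tensor names are made up for illustration.

```python
import glob
import os
import tempfile

import tensorflow.compat.v1 as tf  # assumption: TF 1.15 or TF 2.x with the v1 compat API

tf.disable_v2_behavior()

# Hypothetical log directory; use "/tmp/tboard" to match the tensorboard command above.
logdir = tempfile.mkdtemp(prefix="tboard_")

graph = tf.Graph()
with graph.as_default():
    n = tf.placeholder('int64', name='n')
    total = tf.reduce_sum(tf.range(n) ** 2, name='sum_squares')

# The writer dumps the graph definition into an events file inside logdir.
writer = tf.summary.FileWriter(logdir, graph=graph)
writer.close()

event_files = glob.glob(os.path.join(logdir, "events.out.tfevents.*"))
print(logdir, event_files)
```

Giving operations explicit `name=` arguments, as above, makes the nodes much easier to find in the graphs tab.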

Here's what you should see:

Tensorboard also allows you to draw graphs (e.g. learning curves), record images & audio and play flash games. This is useful when monitoring learning progress and catching some training issues.

One researcher said:

If you spent last four hours of your worktime watching as your algorithm prints numbers and draws figures, you're probably doing deep learning wrong.

You can read more on tensorboard usage here

Practice 2: mean squared error


In [ ]:
# Quest #1 - implement a function that computes a mean squared error of two input vectors
# Your function has to take 2 vectors and return a single number

<YOUR CODE: student.define_inputs_and_transformations()>

mse = <YOUR CODE: student.define_transformation()>

compute_mse = lambda vector1, vector2: sess.run( <YOUR CODE: how to run your graph?> , {})

In [ ]:
# Tests
from sklearn.metrics import mean_squared_error

for n in [1, 5, 10, 10 ** 3]:

    elems = [np.arange(n), np.arange(n, 0, -1), np.zeros(n),
             np.ones(n), np.random.random(n), np.random.randint(100, size=n)]

    for el in elems:
        for el_2 in elems:
            true_mse = np.array(mean_squared_error(el, el_2))
            my_mse = compute_mse(el, el_2)
            if not np.allclose(true_mse, my_mse):
                print('Wrong result:')
                print('mse(%s,%s)' % (el, el_2))
                print("should be: %f, but your function returned %f" %
                      (true_mse, my_mse))
                raise ValueError("Something went wrong")

print("All tests passed")

Tensorflow variables

The inputs and transformations have no value outside a function call. That's a bit unnatural if you want your model to have parameters (e.g. network weights) that are always present, but can change their value over time.

Tensorflow solves this with tf.Variable objects.

  • You can assign a variable a value at any time in your graph
  • Unlike placeholders, there's no need to explicitly pass values to variables when sess.run(...)-ing
  • You can use variables the same way you use transformations

In [ ]:
# creating shared variable
shared_vector_1 = tf.Variable(initial_value=np.ones(5))

# initialize all variables with initial values
sess.run(tf.global_variables_initializer())

In [ ]:
# evaluating shared variable (outside symbolic graph)
print("initial value", sess.run(shared_vector_1))

# within the symbolic graph you use them just like any other input or transformation; no "get value" needed

In [ ]:
# setting new value manually
sess.run(shared_vector_1.assign(np.arange(5)))

# getting that new value
print("new value", sess.run(shared_vector_1))

tf.gradients - why graphs matter

  • Tensorflow can compute derivatives and gradients automatically using the computation graph
  • Gradients are computed as a product of elementary derivatives via chain rule:
$$ {\partial f(g(x)) \over \partial x} = {\partial f(g(x)) \over \partial g(x)}\cdot {\partial g(x) \over \partial x} $$

It can get you the derivative of any graph as long as it knows how to differentiate elementary operations
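To make the chain rule concrete, here is a small numpy check that does not use tensorflow at all. The functions `f` and `g` are arbitrary choices for illustration (f = sin, g = square); the product of their elementary derivatives is compared against a finite-difference estimate.

```python
import numpy as np

# Elementary operations and their known derivatives (illustrative choices).
def g(x):
    return x ** 2

def dg_dx(x):
    return 2 * x

def f(u):
    return np.sin(u)

def df_du(u):
    return np.cos(u)

x0 = 0.7

# Chain rule: d f(g(x)) / dx = f'(g(x)) * g'(x)
analytic = df_du(g(x0)) * dg_dx(x0)

# Central finite difference as an independent numerical estimate.
eps = 1e-6
numeric = (f(g(x0 + eps)) - f(g(x0 - eps))) / (2 * eps)

print(analytic, numeric)
```

A framework that stores the graph can apply the same multiplication automatically, node by node, which is what tf.gradients does in the cells below.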


In [ ]:
my_scalar = tf.placeholder('float32')

scalar_squared = my_scalar ** 2

# a derivative of scalar_squared by my_scalar
derivative = tf.gradients(scalar_squared, [my_scalar])[0]

In [ ]:
x = np.linspace(-3, 3)
x_squared, x_squared_der = sess.run(
    [scalar_squared, derivative], {my_scalar: x})

plt.plot(x, x_squared, label="x^2")
plt.plot(x, x_squared_der, label="derivative")
plt.legend()

Why autograd is cool


In [ ]:
my_vector = tf.placeholder('float32', [None])

# Compute the gradient of the next weird function over my_scalar and my_vector
# warning! Trying to understand the meaning of that function may result in permanent brain damage

weird_psychotic_function = tf.reduce_mean((my_vector+my_scalar)**(1+tf.nn.moments(my_vector, [0])[1]) + 1. / tf.atan(my_scalar))/(my_scalar**2 + 1) + 0.01*tf.sin(
    2*my_scalar**1.5)*(tf.reduce_sum(my_vector) * my_scalar**2)*tf.exp((my_scalar-4)**2)/(1+tf.exp((my_scalar-4)**2))*(1.-(tf.exp(-(my_scalar-4)**2))/(1+tf.exp(-(my_scalar-4)**2)))**2

der_by_scalar = <YOUR CODE: student.compute_grad_over_scalar()>
der_by_vector = <YOUR CODE: student.compute_grad_over_vector()>

In [ ]:
# Plotting your derivative
scalar_space = np.linspace(1, 7, 100)

y = [
    sess.run(weird_psychotic_function, {my_scalar: x, my_vector: [1, 2, 3]})
    for x in scalar_space]

plt.plot(scalar_space, y, label='function')

y_der_by_scalar = [
    sess.run(der_by_scalar, {my_scalar: x, my_vector: [1, 2, 3]})
    for x in scalar_space]

plt.plot(scalar_space, y_der_by_scalar, label='derivative')
plt.grid()
plt.legend()

Almost done - optimizers

While you can perform gradient descent by hand with automatic grads from above, tensorflow also has some optimization methods implemented for you. Recall momentum & rmsprop?
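As a reminder of what a momentum optimizer does, here is a hand-written numpy sketch. It assumes the classic update rule (accumulate gradients into a velocity, then step along the velocity) and a noise-free version of the loss used below; the variable names are made up for this example.

```python
import numpy as np

target = np.array([1.0, 2.0])   # plays the role of y_true
guess = np.zeros(2)             # plays the role of y_guess
velocity = np.zeros(2)          # running accumulation of past gradients

learning_rate, momentum = 0.01, 0.9

for _ in range(300):
    # gradient of mean((guess - target) ** 2) with respect to guess
    grad = 2 * (guess - target) / guess.size

    # momentum update: keep a decayed sum of gradients, step along it
    velocity = momentum * velocity + grad
    guess = guess - learning_rate * velocity

print(guess)
```

With momentum 0.9 the effective step is roughly ten times the raw learning rate, which is why such a small learning rate still converges within a few hundred iterations.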


In [ ]:
y_guess = tf.Variable(np.zeros(2, dtype='float32'))
y_true = tf.range(1, 3, dtype='float32')

loss = tf.reduce_mean((y_guess - y_true + tf.random_normal([2]))**2)

optimizer = tf.train.MomentumOptimizer(
    0.01, 0.9).minimize(loss, var_list=[y_guess])

# same, but more detailed:
# updates = [[tf.gradients(loss,y_guess)[0], y_guess]]
# optimizer = tf.train.MomentumOptimizer(0.01,0.9).apply_gradients(updates)

In [ ]:
from IPython.display import clear_output

sess.run(tf.global_variables_initializer())

guesses = [sess.run(y_guess)]

for _ in range(100):
    sess.run(optimizer)
    guesses.append(sess.run(y_guess))

    clear_output(True)
    plt.plot(*zip(*guesses), marker='.')
    plt.scatter(*sess.run(y_true), c='red')
    plt.show()

Logistic regression example

Implement the regular logistic regression training algorithm

We shall train on a two-class subset of the scikit-learn digits dataset (8x8 images, an MNIST-like toy set).

This is a binary classification problem, so we'll train a Logistic Regression with sigmoid. $$P(y_i | X_i) = \sigma(W \cdot X_i + b) = { 1 \over {1+e^{- [W \cdot X_i + b]}} }$$

The natural choice of loss function is to use binary crossentropy (aka logloss, negative llh): $$ L = {1 \over N} \sum_{X_i, y_i} - [ y_i \cdot \log P(y_i | X_i) + (1-y_i) \cdot \log (1-P(y_i | X_i)) ]$$

Mind the minus :)
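For reference, the two formulas above can be written in plain numpy as below. This is only a sketch for checking your tensorflow version against: the helper names and the toy data are invented here, and the clipping constant `eps` is an added assumption to keep log away from zero.

```python
import numpy as np

def sigmoid(z):
    # sigma(z) = 1 / (1 + exp(-z))
    return 1.0 / (1.0 + np.exp(-z))

def binary_crossentropy(y_true, p, eps=1e-7):
    # Clip probabilities so that log never receives exactly 0 (assumption, not in the formula).
    p = np.clip(p, eps, 1 - eps)
    # Mind the minus: the loss is the NEGATIVE mean log-likelihood.
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

# Toy data: 6 samples, 3 features (made up for illustration).
rng = np.random.RandomState(0)
X_toy = rng.randn(6, 3)
y_toy = np.array([0, 1, 1, 0, 1, 0])

W_toy = np.zeros(3)
b_toy = 0.0

p_toy = sigmoid(X_toy.dot(W_toy) + b_toy)
loss_toy = binary_crossentropy(y_toy, p_toy)
print(loss_toy)
```

With all-zero weights every prediction is 0.5, so the loss equals log 2; a freshly initialized model should start near that value.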


In [ ]:
from sklearn.datasets import load_digits
X, y = load_digits(n_class=2, return_X_y=True)

print("y [shape - %s]:" % (str(y.shape)), y[:10])
print("X [shape - %s]:" % (str(X.shape)))

In [ ]:
print('X:\n', X[:3, :10])
print('y:\n', y[:10])
plt.imshow(X[0].reshape([8, 8]))

In [ ]:
# inputs and shared variables
weights = <YOUR CODE: student.create_variable()>
input_X = <YOUR CODE: student.create_placeholder_matrix()>
input_y = <YOUR CODE: student.code_placeholder_vector()>

In [ ]:
predicted_y_proba = <YOUR CODE: predicted probabilities for input_X using weights>

loss = <YOUR CODE: logistic loss(scalar, mean over sample) between predicted_y_proba and input_y>

train_step = <YOUR CODE: operator that minimizes loss>

In [ ]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

In [ ]:
from sklearn.metrics import roc_auc_score

for i in range(5):

    loss_i, _ = sess.run([loss, train_step], <YOUR CODE: feed values to placeholders>)

    print("loss at iter %i: %.4f" % (i, loss_i))

    print("train auc:", roc_auc_score(
        y_train, sess.run(predicted_y_proba, {input_X: X_train})))
    print("test auc:", roc_auc_score(
        y_test, sess.run(predicted_y_proba, {input_X: X_test})))


print("resulting weights:")
plt.imshow(sess.run(weights).reshape(8, -1))
plt.colorbar();

Practice 3: my first tensorflow network

Your ultimate task for this week is to build your first neural network [almost] from scratch in pure tensorflow.

This time you will solve the same digit recognition problem, but at a larger scale

  • images are now 28x28
  • 10 different digits
  • 50k samples

Note that you are not required to build 152-layer monsters here. A 2-layer (one hidden, one output) NN should already give you an edge over logistic regression.

[bonus score] If you've already beaten logistic regression with a two-layer net, but enthusiasm still ain't gone, you can try improving the test accuracy even further! The milestones would be 95%/97.5%/98.5% accuracy on test set.

SPOILER! At the end of the notebook you will find a few tips and frequently made mistakes. If you feel mighty enough to shoot yourself in the foot without external assistance, we encourage you to do so, but if you encounter any insurmountable issues, please do look there before mailing us.


In [ ]:
from mnist import load_dataset

# [down]loading the original MNIST dataset.
# Please note that you should only train your NN on _train sample,
#  _val can be used to evaluate out-of-sample error, compare models or perform early-stopping
#  _test should be hidden under a rock until final evaluation... But we both know it is near impossible to catch you evaluating on it.
X_train, y_train, X_val, y_val, X_test, y_test = load_dataset()

print(X_train.shape, y_train.shape)

In [ ]:
plt.imshow(X_train[0, 0])

In [ ]:
<this cell looks as if it wants you to create variables here>

In [ ]:
<you could just as well create a computation graph here - loss, optimizers, all that stuff>

In [ ]:
<this may or may not be a good place to run optimizer in a loop>

In [ ]:
<this may be a perfect cell to write a training & evaluation loop in>

In [ ]:
<predict & evaluate on test here, right? No cheating pls.>








SPOILERS!

Recommended pipeline

  • Adapt logistic regression from previous assignment to classify some number against others (e.g. zero vs nonzero)
  • Generalize it to multiclass logistic regression.
    • Either try to remember lecture 0 or google it.
    • Instead of weight vector you'll have to use matrix (feature_id x class_id)
    • softmax (exp over sum of exps) can be implemented manually or as tf.nn.softmax (stable)
    • probably better to use STOCHASTIC gradient descent (minibatch)
      • in which case sample should probably be shuffled (or use random subsamples on each iteration)
  • Add a hidden layer. Now your logistic regression uses hidden neurons instead of inputs.

    • Hidden layer uses the same math as output layer (ex-logistic regression), but uses some nonlinearity (sigmoid) instead of softmax
    • You need to train both layers, not just output layer :)
    • Do not initialize layers with zeros (due to symmetry effects). Gaussian noise with a small sigma will do.
    • 50 hidden neurons and a sigmoid nonlinearity will do for a start. Many ways to improve.
    • In the ideal case this totals to 2 .dot's, 1 softmax and 1 sigmoid
    • make sure this neural network works better than logistic regression
  • Now's the time to try improving the network. Consider layers (size, neuron count), nonlinearities, optimization methods, initialization - whatever you want, but please avoid convolutions for now.
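Two of the tips above, a numerically stable manual softmax and shuffled minibatches, can be sketched in numpy as follows. The helper names `softmax` and `iterate_minibatches` and the toy arrays are hypothetical, chosen for this illustration; in the actual network you would build the softmax from tensorflow operations.

```python
import numpy as np

def softmax(logits):
    # Subtracting the row-wise max leaves the result unchanged
    # but prevents overflow in exp for large logits.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exps = np.exp(shifted)
    return exps / exps.sum(axis=-1, keepdims=True)

def iterate_minibatches(X, y, batch_size, rng):
    # Visit every sample once per epoch, in a freshly shuffled order.
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        yield X[idx], y[idx]

logits = np.array([[1.0, 2.0, 3.0],
                   [1000.0, 1000.0, 1000.0]])  # second row would overflow a naive exp
probs = softmax(logits)
print(probs)

rng = np.random.RandomState(0)
X_toy = np.arange(20).reshape(10, 2)
y_toy = np.arange(10)
batches = list(iterate_minibatches(X_toy, y_toy, batch_size=4, rng=rng))
print([len(batch_y) for _, batch_y in batches])
```

The last minibatch may be smaller than `batch_size`; that is usually harmless when the loss is a mean over the batch.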